Reconciling Schema Matching Networks
Abstract
Schema matching is the process of establishing correspondences between the attributes of schemas for the purpose of data integration. It is often performed in a pairwise setting, in which two given schemas are matched against each other by automatic tools. In this thesis, we instead approach the schema matching problem in a network setting, in which the two schemas to be matched do not exist in isolation but participate in a larger network and connect to several other schemas at the same time; we call such a network a schema matching network. The notion of a schema matching network is novel in its own right and is beneficial in many real-world scenarios, including large enterprises and mashup applications. There is a large body of work on schema matching techniques; numerous commercial and academic schema matching tools, called matchers, have been developed in recent years. Since matchers rely on heuristic techniques, their results are inherently uncertain. Even though matchers achieve impressive performance on some datasets, they cannot be expected to yield a correct result in the general case. In practice, data integration tasks often include a post-matching reconciliation phase, in which correspondences are reviewed and validated by one or more users. This process of reviewing and validating correspondences, called reconciliation, incrementally leads to the identification of the correct correspondences. Human reconciliation is a tedious and time-consuming task, which raises several issues in designing an interaction scheme that reduces the validation effort. To address these issues, we propose reconciliation methods that enable the automation and analysis of the reconciliation process. In particular, we go beyond the common practice of human reconciliation, which improves and validates matchings for a single pair of schemas. Instead, we study reconciliation for a schema matching network (i.e., a network of related schemas matched against each other).
Having a network of multiple schemas enables the introduction of network-level integrity constraints, which should be respected during reconciliation. The presence of such integrity constraints creates a number of dependencies between correspondences that, however, may be hard to keep track of in large networks. Despite this challenge, those dependencies create an opportunity to guide the validation work and minimize the necessary effort by providing evidence for the matching quality. The contributions of this work address the issues of reconciling schema matching networks in the following three settings.
Similar resources
An Improved Semantic Schema Matching Approach
Schema matching is a critical step in many applications, such as data warehouse loading, Online Analytical Processing (OLAP), data mining, the semantic web [2], and schema integration. The task is to find the semantic correspondences between elements of two schemas. Recently, schema matching has attracted considerable interest in both research and practice. In this paper, we present a new impr...
Minimizing Human Effort in Reconciling Match Networks
Schema and ontology matching is the process of establishing correspondences between schema attributes and ontology concepts for the purpose of data integration. Various commercial and academic tools have been developed to support this task. These tools provide impressive results on some datasets. However, as the matching is inherently uncertain, the developed heuristic techniques give rise to re...
Reconciling Schema Matching Networks Through Crowdsourcing
Schema matching is the process of establishing correspondences between the attributes of database schemas for data integration purposes. Although several automatic schema matching tools have been developed, their results are often incomplete or erroneous. To obtain a correct set of correspondences, usually human effort is required to validate the generated correspondences. This validation proce...
Towards a More Scalable Schema Matching: A Novel Approach
With the development and use of a large variety of DB schemas and ontologies in many domains (e.g., semantic web, digital libraries, life science), matching techniques are called upon to overcome the challenge of aligning and reconciling these different, interrelated representations. The matching field is becoming a very attractive research topic. In this chapter, the authors are interested in s...
Argumentation-based schema matching for multiple digital libraries
Purpose – Most digital libraries (DLs) are now available online. They also provide the Z39.50 standard protocol which allows computer-based systems to effectively retrieve information stored in the DLs. The major difficulty lies in inconsistency between database schemas of multiple DLs. This paper presents a system known as Argumentation-based Digital Library Search (or ADLSearch) which facilit...